10 research outputs found
HaVQA: A Dataset for Visual Question Answering and Multimodal Research in Hausa Language
This paper presents HaVQA, the first multimodal dataset for visual
question-answering (VQA) tasks in the Hausa language. The dataset was created
by manually translating 6,022 English question-answer pairs, which are
associated with 1,555 unique images from the Visual Genome dataset. As a
result, the dataset provides 12,044 gold standard English-Hausa parallel
sentences that were translated in a fashion that guarantees their semantic
match with the corresponding visual information. We conducted several baseline
experiments on the dataset, including visual question answering, visual
question elicitation, and text-only and multimodal machine translation.
Comment: Accepted at ACL 2023 (Findings) as a long paper
EFaR 2023: Efficient Face Recognition Competition
This paper presents the summary of the Efficient Face Recognition Competition
(EFaR) held at the 2023 International Joint Conference on Biometrics (IJCB
2023). The competition received 17 submissions from 6 different teams. To drive
further development of efficient face recognition models, the submitted
solutions are ranked based on a weighted score of the achieved verification
accuracies on a diverse set of benchmarks, as well as the deployability given
by the number of floating-point operations and model size. The evaluation of
submissions is extended to bias, cross-quality, and large-scale recognition
benchmarks. Overall, the paper gives an overview of the achieved performance
values of the submitted solutions as well as a diverse set of baselines. The
submitted solutions use small, efficient network architectures to reduce the
computational cost, some solutions apply model quantization. An outlook on
possible techniques that are underrepresented in current solutions is given as
well.
Comment: Accepted at IJCB 2023
CNN Patch Pooling for Detecting 3D Mask Presentation Attacks in NIR
Presentation attacks using 3D masks pose a serious threat to face recognition systems. Automatic detection of these attacks is challenging due to the hyper-realistic nature of masks. In this work, we consider presentations acquired in the near infrared (NIR) imaging channel for detection of mask-based attacks. We propose a patch pooling mechanism to learn complex textural features from lower layers of a convolutional neural network (CNN). The proposed patch pooling layer can be used in conjunction with a pretrained face recognition CNN without fine-tuning or adaptation. The pretrained CNN, in fact, can also be trained from visual spectrum data. We demonstrate the efficacy of the proposed method on mask attacks in the NIR channel from the WMCA and MLFP datasets. It achieves near-perfect results on WMCA data, and outperforms the existing benchmark on the MLFP dataset by a large margin.
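The core idea of the abstract (pooling features over local patches of a lower CNN layer to capture texture) can be illustrated with a minimal sketch. This is not the paper's implementation; the function name, patch size, and max-pooling choice are assumptions for illustration only.

```python
import numpy as np

# Illustrative sketch of patch pooling over a CNN feature map.
# Names, shapes, and the use of max-pooling are assumptions; the
# paper's exact layer design may differ.

def patch_pool(feature_map, patch_size=4):
    """Split a C x H x W feature map into non-overlapping patches
    and max-pool each patch, yielding one value per patch per channel."""
    c, h, w = feature_map.shape
    ph, pw = h // patch_size, w // patch_size
    # Crop to a multiple of the patch size, then block-reshape.
    fm = feature_map[:, :ph * patch_size, :pw * patch_size]
    blocks = fm.reshape(c, ph, patch_size, pw, patch_size)
    return blocks.max(axis=(2, 4))  # shape: (c, ph, pw)

# Example: a 64-channel, 28x28 feature map pooled with 4x4 patches.
fm = np.random.rand(64, 28, 28).astype(np.float32)
pooled = patch_pool(fm, patch_size=4)
print(pooled.shape)  # (64, 7, 7)
```

The pooled tensor summarizes local texture statistics per region, which can then feed a classifier without retraining the backbone CNN.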
Multispectral Deep Embeddings As a Countermeasure To Custom Silicone Mask Presentation Attacks
This work focuses on detecting presentation attacks (PA) mounted using custom silicone masks. Face recognition (FR) systems have been shown to be highly vulnerable to PAs based on such masks [1, 2]. Here we explore the use of multispectral data (color imagery, near infrared (NIR) imagery, and thermal imagery) for face presentation attack detection (PAD), specifically against custom silicone mask attacks. Using a new dataset (XCSMAD) representing 21 custom-made masks, we establish the baseline performance of several commonly used face-PAD methods on the different imaging channels. Considering thermal imagery in particular, our experiments show that low-cost thermal imaging devices are as effective in face-PAD as more expensive thermal cameras for mask-based attacks. This result reinforces the case for the use of thermal data in face-PAD.
We also demonstrate that fusing information from multiple channels leads to significant improvement in face-PAD performance. Finally, we propose a new approach to face-PAD of custom silicone masks using a convolutional neural network (CNN). On individual spectral channels, the proposed approach achieves state-of-the-art results. Using multispectral fusion, the proposed CNN-based method significantly outperforms the baseline methods. The new dataset and source code for our experiments are freely available for research purposes.
Detection of Age-Induced Makeup Attacks on Face Recognition Systems Using Multi-Layer Deep Features
Makeup is a simple and easy instrument that can alter the appearance of a person's face and, hence, enable a presentation attack on face recognition (FR) systems. These attacks, especially the ones mimicking ageing, are difficult to detect due to their close resemblance to genuine (non-makeup) appearances. Makeup can also degrade the performance of recognition systems and of various algorithms that use the human face as an input. The detection of facial makeup is an effective prohibitory measure to minimize these problems. This work proposes a deep learning-based presentation attack detection (PAD) method to identify facial makeup. We propose the use of a convolutional neural network (CNN) to extract features that can distinguish between presentations with age-induced facial makeup (attacks) and those without makeup (bona fide). These feature descriptors, based on shape and texture cues, are constructed from multiple intermediate layers of a CNN. We introduce a new dataset, AIM (Age-Induced Makeups), consisting of 200+ video presentations each of old-age makeups and bona fide faces. Our experiments indicate that makeups in AIM result in a 14% decrease in the median matching scores of a recent CNN-based FR system. We demonstrate the accuracy of the proposed PAD method, which correctly classifies 93% of the presentations in the AIM dataset. In additional testing, it also outperforms existing methods for the detection of generic makeup. A simple score-level fusion, performed on the classification scores of the shape- and texture-based features, can further improve the accuracy of the proposed makeup detector.
Bengali Visual Genome 1.0
Data
-------
Bengali Visual Genome (BVG for short) 1.0 has similar goals as Hindi Visual Genome (HVG) 1.1: to support the Bengali language. BVG 1.0 is a multimodal dataset consisting of text and images, suitable for English-to-Bengali multimodal machine translation, image captioning, and multimodal research. We follow the same selection of short English segments (captions) and the associated images from Visual Genome as HVG 1.1. For BVG, we manually translated these captions from English to Bengali, taking the associated images into account. The manual translation was performed by native Bengali speakers without referring to any machine translation system.
The training set contains 29K segments. Further 1K and 1.6K segments are provided in the development and test sets, respectively, following the same (random) sampling as the original Hindi Visual Genome. A third test set, called the "challenge test set", consists of 1.4K segments. The challenge test set was created for the WAT2019 multimodal task by searching for (particularly) ambiguous English words based on embedding similarity and manually selecting those where the image helps to resolve the ambiguity. Note, however, that the surrounding words in the sentence often also provide sufficient cues to identify the correct meaning of the ambiguous word.
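One way to flag candidate ambiguous words by embedding similarity can be sketched as follows. The vectors and "sense anchor" words here are toy values for illustration; the actual WAT2019 selection procedure may have differed.

```python
import numpy as np

# Toy sketch: a word whose vector is close to anchors of two unrelated
# senses is a candidate ambiguous word. All values are illustrative.

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy embeddings: "court" sits between a legal and a sports sense anchor.
emb = {
    "court":  np.array([0.7, 0.7, 0.0]),
    "judge":  np.array([1.0, 0.0, 0.0]),   # legal sense anchor
    "tennis": np.array([0.0, 1.0, 0.0]),   # sports sense anchor
}

sim_legal = cosine(emb["court"], emb["judge"])
sim_sport = cosine(emb["court"], emb["tennis"])
# Similar to both unrelated anchors -> candidate for the challenge set,
# pending manual verification that the image resolves the ambiguity.
is_ambiguous = min(sim_legal, sim_sport) > 0.5
print(is_ambiguous)  # True
```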
Dataset Formats
---------------
The multimodal dataset contains both text and images.
The text parts of the dataset (train and test sets) are in simple tab-delimited plain text files.
All the text files have seven columns as follows:
Column1 - image_id
Column2 - X
Column3 - Y
Column4 - Width
Column5 - Height
Column6 - English Text
Column7 - Bengali Text
The image part contains the full images with the corresponding image_id as the file name. The X, Y, Width and Height columns indicate the rectangular region in the image described by the caption.
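The seven-column layout above can be read with standard TSV parsing. The sample row below is hypothetical; only the column order follows the description.

```python
import csv
from io import StringIO

# Minimal sketch of parsing one tab-delimited line of the BVG text files.
# The sample row (image_id and captions) is made up for illustration.
sample = "2407890\t140\t22\t47\t149\ta man wearing a hat\t<Bengali caption>"

reader = csv.reader(StringIO(sample), delimiter="\t")
image_id, x, y, width, height, english, bengali = next(reader)

# The (X, Y, Width, Height) box describes the captioned region; with
# Pillow it could be cropped as:
#   Image.open(f"{image_id}.jpg").crop(
#       (int(x), int(y), int(x) + int(width), int(y) + int(height)))
print(image_id, english)
```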
Data Statistics
---------------
The statistics of the current release are given below.
Parallel Corpus Statistics
--------------------------
Dataset Segments English Words Bengali Words
---------- -------- ------------- -------------
Train 28930 143115 113978
Dev 998 4922 3936
Test 1595 7853 6408
Challenge Test 1400 8186 6657
---------- -------- ------------- -------------
Total 32923 164076 130979
The word counts are approximate, prior to tokenization.
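The approximate counts above can be reproduced by simple whitespace splitting, with no tokenizer involved. A minimal sketch (the file path and malformed-line handling are assumptions):

```python
# Sketch of reproducing the approximate per-split statistics:
# segments plus whitespace-split word counts for columns 6 and 7.

def corpus_stats(path):
    segments = en_words = bn_words = 0
    with open(path, encoding="utf-8") as f:
        for line in f:
            cols = line.rstrip("\n").split("\t")
            if len(cols) != 7:
                continue  # skip malformed lines, if any
            segments += 1
            en_words += len(cols[5].split())  # English Text column
            bn_words += len(cols[6].split())  # Bengali Text column
    return segments, en_words, bn_words
```

Running this over each split's text file should yield numbers close to the table, modulo the pre-tokenization caveat noted above.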
Citation
--------
If you use this corpus, please cite the following paper:
@inproceedings{hindi-visual-genome:2022,
title= "{Bengali Visual Genome: A Multimodal Dataset for Machine Translation and Image Captioning}",
author={Sen, Arghyadeep
and Parida, Shantipriya
and Kotwal, Ketan
and Panda, Subhadarshi
and Bojar, Ond{\v{r}}ej
and Dash, Satya Ranjan},
editor={Satapathy, Suresh Chandra
and Peer, Peter
and Tang, Jinshan
and Bhateja, Vikrant
and Ghosh, Anumoy},
booktitle= {Intelligent Data Engineering and Analytics},
publisher= {Springer Nature Singapore},
address= {Singapore},
pages = {63--70},
isbn = {978-981-16-6624-7},
doi = {10.1007/978-981-16-6624-7_7},
year = {2022}
}